Abstract: We consider coding schemes for channels with non-uniform
inputs (NUI), where standard linear block codes cannot
be applied directly. We show that multilevel coding (MLC) with
a set of linear codes and a deterministic mapper can achieve the 
information rate of the channel with NUI. The mapper, however, 
does not have to be one-to-one. As an application of the proposed 
MLC scheme, we present a rateless transmission scheme over the 
binary symmetric channel (BSC). 

I. Introduction 

Consider a discrete memoryless channel (DMC) with input
$X \in \mathcal{X}$, output $Y \in \mathcal{Y}$, and let $P_X$ be the input distribution.
When $P_X$ is the uniform distribution on $\mathcal{X}$, denoted by
$\mathrm{unif}(\mathcal{X})$, it is well known that linear codes can be directly used
to achieve the information rate corresponding to $P_X$ [1]. When
$|\mathcal{X}| = 2^m$, binary linear codes along with multilevel coding
(MLC) suffice to achieve the information rate corresponding
to $\mathrm{unif}(\mathcal{X})$. Given extensive recent results on designing good
LDPC codes for binary-input symmetric channels [2], [3],
it suffices to say that LDPC codes provide a good practical
solution to this communication problem.

However, in many cases, we are interested in input distributions
which are not uniform. This could be because the
capacity-achieving distribution is non-uniform (for example,
the Z channel [4]), or because other signalling constraints
force us to use non-uniform inputs. An example of the latter is
optical channels with crosstalk, where the probability of 1s
transmitted by each user, $p_1 = P(X = 1)$, has to be constrained
to $p_1 \ll 1/2$ to control the interference to other users [5]. In
such scenarios, binary linear codes cannot be applied directly
since they can only induce the uniform distribution. We refer
to such channels as 'channels with non-uniform inputs (NUI)'.
The coding problem for such channels remains open [4].

This problem was previously studied by Ratzer and MacKay
[5], [6]. In [6], they focused on designing inverse Huffman
code type mappers to induce the desired distribution. However,
soft-output decoding of the Huffman code is usually computationally
complex and, further, the variable-length nature of the
mapping may incur catastrophic decoding errors. Alternatively,
in [5], LDPC codes over GF(q) with deterministic mappers
were used to induce the desired non-uniform distribution. The
main drawbacks of this scheme are that the decoding complexity
of the nonbinary LDPC code is significantly larger and that the
code optimization is very complicated.



In this paper, we first show that MLC using a set of binary 
linear codes and a deterministic mapper suffices to achieve 
the information rate of the channel with NUI. The mapper, 
however, does not necessarily have to be one-to-one. This 
scheme, discussed in Section II, is shown to be optimal when
the channel law is known at the transmitter. Although an
MLC scheme with binary inputs can only induce dyadic input
distributions, it is shown that via proper time sharing, the
proposed MLC with a small number of layers can get close to
the channel information rate for an arbitrary $P_X$. Compared
with the previous works, the proposed MLC scheme not only 
has low complexity, but is theoretically justifiable as well. 

As an important application of coding for channels with 
NUI, we consider the problem of rateless transmission over 
the binary symmetric channel (BSC). In [7], a simple layering, 
dithering (or interleaving) and repeating based rateless scheme 
was proposed for the AWGN channel. In this paper, we extend
(non-trivially) their results to the BSC case. Thanks to the
degraded nature of the BSC, a similar layering scheme can
be applied without a rate loss. However, in order to perform
layering, the number of 1s among the coded bits in each layer must
be constrained. This is precisely where the proposed MLC
scheme can be applied. We further show that repeating does 
not incur a rate loss in the low rate region over the BSC even 
for non-uniform inputs. Therefore, rateless transmission over 
the BSC becomes possible by simply layering, interleaving 
and repeating the proposed MLC block. 

II. Coding for Channels with Non-Uniform Inputs 

The problem of coding for channels with NUI can be dated 
back to Gallager. In [8], Gallager showed that binary linear 
codes can achieve the capacity of any DMC: 

Theorem 1 Binary linear codes can be used to achieve the 
capacity of an arbitrary discrete memoryless channel. 

We refer interested readers to [8] for the detailed proof. 
The main result of this theorem says that for any DMC, 
capacity can be achieved by a set of linear codes with a 
deterministic mapper under maximum likelihood decoding 
(MLD). However, as suggested by Gallager, finding practical 
decoding algorithms is a nontrivial problem. Note that in
Gallager's proof, the key component, a deterministic mapper, is
used to induce the desired channel input distribution to achieve
the capacity of the DMC.

Fig. 1. An example of a deterministic mapper

Fig. 2. System Model for the Proposed Coding Scheme for Constrained Input Channels

The deterministic mapper can be defined as follows:



Definition 1 A deterministic mapping is a function $f : W \to X$,
where $W \in \{0, 1\}^m$ and $X \in \mathcal{X}$ ($\mathcal{X}$ is the set of all
possible channel input symbols).

For example, consider the channel with NUI given by $p_1 =
P(X = 1) = 1/4$ and $p_0 = P(X = 0) = 3/4$. A possible
deterministic mapper is shown in Figure 1, where $W \in \{0, 1\}^2$
and $X \in \{0, 1\}$. Since $W$ is uniformly distributed, the mapper
can induce the desired distribution on $X$. Note that using
linear codes with a deterministic mapper, we can only obtain
probabilities of the form $k/2^m$. However, by increasing $m$,
we can approximate the desired distribution arbitrarily well.
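The induced distribution can be checked by enumerating all values of $W$. The following is a minimal sketch: the mapper table `MAPPER` is a hypothetical choice (only $W = (1, 1)$ maps to $X = 1$) and is not necessarily the mapping drawn in Figure 1; the helper name `induced_distribution` is likewise illustrative.

```python
from fractions import Fraction
from itertools import product


def induced_distribution(mapper, m):
    """Distribution of X = mapper[W] when W is uniform on {0,1}^m."""
    dist = {}
    for w in product((0, 1), repeat=m):
        x = mapper[w]
        # Each of the 2^m input words carries probability 1/2^m.
        dist[x] = dist.get(x, Fraction(0)) + Fraction(1, 2 ** m)
    return dist


# Hypothetical many-to-one mapper (assumption): three words map to 0, one to 1.
MAPPER = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
DIST = induced_distribution(MAPPER, 2)
print(DIST)  # probabilities are multiples of 1/4, as expected for m = 2
```

Because three of the four input words share the same output, this mapper is not one-to-one, yet it still induces $p_1 = 1/4$.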

Proposed Scheme: We first propose an MLC scheme to
achieve the information rate of channels with NUI. The
diagram of the proposed MLC scheme is shown in Figure 2
and the details of the scheme are as follows.

Encoding: In each layer, $W_i$ is encoded using a capacity-achieving
binary linear code. The code rate of the $i$-th layer is selected
to be $R_i = I(W_i; Y \mid W_1, \ldots, W_{i-1})$. Then we induce the
desired distribution on $X$ from $W = [W_1, \ldots, W_m]$ using a
deterministic mapper as suggested in Theorem 1.

Decoding: At the decoder, we apply multistage decoding (MSD):
$W_1$ is first decoded and then $W_2$ is decoded based on $Y$ and
the decision on $W_1$, and so forth until $W_m$ is decoded based
on $Y$ and all the decisions from $W_1$ to $W_{m-1}$.

Now, we show that the information rate of a channel with 
NUI can be achieved by the above MLC scheme. 

Theorem 2 The proposed coding scheme can achieve
the information rate of the DMC with NUI, i.e.,
$\sum_{i=1}^{m} I(W_i; Y \mid W_1, \ldots, W_{i-1}) = I(X; Y)$.

Proof: We first show that the deterministic mapping from
$W$ to $X$ does not incur a rate loss. Note that $W \to X \to Y$
forms a Markov chain. Expanding $I(W; Y, X)$ in two ways,
we have:

$$I(W; Y, X) = I(W; X) + I(W; Y \mid X) \qquad (1)$$
$$\phantom{I(W; Y, X)} = I(W; Y) + I(W; X \mid Y) \qquad (2)$$

Due to the Markovian structure, $I(W; Y \mid X) = 0$. Thus, the
mutual information between $W$ and $Y$ can be written as:

$$I(W; Y) = I(W; X) - I(W; X \mid Y) \qquad (3)$$
$$\phantom{I(W; Y)} = (H(X) - H(X \mid W)) - (H(X \mid Y) - H(X \mid W, Y)) \qquad (4)$$
$$\phantom{I(W; Y)} = H(X) - H(X \mid Y) = I(X; Y) \qquad (5)$$

The step from (4) to (5) follows from the fact that $X$ is a function of $W$,
so that $H(X \mid W) = H(X \mid W, Y) = 0$.

Hence, we can achieve the information rate using the
proposed MLC. Since $W = [W_1, W_2, \ldots, W_m]$, the mapping
from $W_1, W_2, \ldots, W_m$ to $W$ is a bijection. According to the
chain rule of mutual information, we have:

$$I(X; Y) = I(W; Y) = \sum_{i=1}^{m} I(W_i; Y \mid W_1, \ldots, W_{i-1}) \qquad (6)$$

■

The proof generalizes the MLC proof in [9], where the 
mapping from W to X is a bijection. However, here, W does 
not need to be a deterministic function of X , which suggests 
that the one-to-one type mappers (e.g., inverse Huffman code 
[6]) are not required in order to achieve the information rate. 
Essentially, what is needed is a deterministic mapper, which 
shapes the uniform distribution obtained from the coded bits 
of the linear codes to be the desired channel NUI distribution. 
Besides, the theorem implies that in each layer, standard binary 
linear codes, such as binary LDPC codes suffice. 
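As a numerical sanity check of Theorem 2, the sketch below computes each layer rate $I(W_i; Y \mid W_1, \ldots, W_{i-1})$ directly from the joint distribution and compares the sum with $I(X;Y)$ for a BSC. The mapper table (only $W = (1,1)$ maps to $X = 1$), the crossover probability `H = 0.1`, and all function names are assumptions chosen for illustration.

```python
from itertools import product
from math import log2


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return entropy(pa.values()) + entropy(pb.values()) - entropy(joint.values())


def layer_rates(mapper, m, h):
    """R_i = I(W_i; Y | W_1..W_{i-1}) for uniform W, X = mapper[W], over BSC(h)."""
    joint = {}
    for w in product((0, 1), repeat=m):
        for y in (0, 1):
            joint[(w, y)] = (1 - h if y == mapper[w] else h) / 2 ** m
    rates = []
    for i in range(m):
        rate = 0.0
        # Average I(W_i; Y | prefix) over all values of the already-decoded layers.
        for prefix in product((0, 1), repeat=i):
            cond = {}
            for (w, y), p in joint.items():
                if w[:i] == prefix:
                    cond[(w[i], y)] = cond.get((w[i], y), 0.0) + p
            weight = sum(cond.values())
            cond = {k: v / weight for k, v in cond.items()}
            rate += weight * mutual_information(cond)
        rates.append(rate)
    return rates


# Hypothetical mapper and channel parameter (assumptions).
MAPPER = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
H = 0.1
RATES = layer_rates(MAPPER, 2, H)
P_Y1 = 0.25 * (1 - H) + 0.75 * H          # P(Y = 1) when P(X = 1) = 1/4
I_XY = entropy([P_Y1, 1 - P_Y1]) - entropy([H, 1 - H])
print(RATES, sum(RATES), I_XY)
```

The two layer rates differ, which is why each layer needs its own code rate, but their sum agrees with $I(X;Y)$ up to floating-point error.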

Example 1: We give an example of the proposed MLC
scheme over a BSC with NUI. In Figure 3, the information rate
is plotted as a function of the channel crossover probability
$h$. The probability of 1s at the channel input is fixed to be
$p_1 = 1/4$. We can see that the proposed MLC can achieve
the information rate supported by the channel. In contrast,
time sharing a linear code with 0s will incur a significant rate
loss. Besides, note that the mapper will introduce memory
across the layers. Therefore, bit-interleaved coded modulation
(BICM) without iterative demodulation (the demapper generates
bit-level soft information for each layer by ignoring the
correlation across the layers) also incurs a significant rate loss.
For Z channels, a similar phenomenon is observed. Due to the
page limit, those results are not shown here.

Fig. 3. Achievable Rate of Different Schemes over a BSC with $p_1 = 1/4$
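The gap described in Example 1 can be reproduced with a few lines. This is a sketch under stated assumptions: the MLC rate is taken as $H(p_1 \circledast h) - H(h)$, and "time sharing a linear code with 0s" is modeled as sending uniform coded bits for a fraction $2p_1$ of the time and zeros otherwise, which is one plausible reading of the baseline. The function names are illustrative, and the BICM curve is not modeled.

```python
from math import log2


def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def conv(p, h):
    """Probability of a 1 at the BSC(h) output when the input is Bernoulli(p)."""
    return p * (1 - h) + (1 - p) * h


def mlc_rate(p1, h):
    """Information rate I(X;Y) of a BSC(h) with P(X = 1) = p1."""
    return hb(conv(p1, h)) - hb(h)


def zero_padding_rate(p1, h):
    """Assumed baseline: uniform code bits for a fraction 2*p1 of the time, 0s otherwise."""
    return 2 * p1 * (1 - hb(h))


for h in (0.05, 0.1, 0.2, 0.3, 0.4):
    print(h, mlc_rate(0.25, h), zero_padding_rate(0.25, h))
```

Since $I(X;Y)$ is concave in $p_1$, the MLC rate lies above the chord that the zero-padding baseline follows for every $0 < h < 1/2$.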



The input probabilities that can be induced are of the form
$k/2^m$, i.e., dyadic fractions. However, any desired
input probability $p_1$ can be approached by properly time
sharing between two MLC schemes. For instance, if we need
$p_1 = 2/5$ at the channel input, we can time share between
two MLC schemes with 2 layers, one with $p_1 = 1/4$ and the
other with $p_1 = 1/2$. The rate loss of the proposed MLC time
sharing scheme is usually small.

Example 2: Consider the proposed MLC for a BSC
with NUI in Figure 4. In this case, the channel crossover
probability is fixed at $h = 0.3$ and the plot shows the change
of the information rate as a function of $p_1$. We can see that
the simple scheme that time shares a linear code with 0s will
incur a substantial loss. To get close to the information rate,
we may time share between two of the MLC schemes with
$m = 3$. As shown in Figure 4, for any $k/8 < p_1 < (k + 1)/8$,
time sharing between the MLC schemes with $p_1 = k/8$ and
$p_1 = (k + 1)/8$ achieves most of the information rate.
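The time-sharing rule can be made concrete as follows. This sketch picks the two neighbouring dyadic points $k/2^m$ and $(k+1)/2^m$, computes the time fraction spent on each so that the average input probability equals the target $p_1$, and evaluates the resulting rate on a BSC. The helper names and the choice of exact `Fraction` arithmetic for the weights are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor, log2


def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def mlc_rate(p1, h):
    """I(X;Y) of a BSC(h) with P(X = 1) = p1."""
    q = p1 * (1 - h) + (1 - p1) * h
    return hb(q) - hb(h)


def time_share_weights(p1, m):
    """Return (lo, hi, lam): neighbouring dyadic points and the time fraction on lo."""
    n = 2 ** m
    p1 = Fraction(p1)
    k = min(floor(p1 * n), n - 1)
    lo, hi = Fraction(k, n), Fraction(k + 1, n)
    lam = (hi - p1) / (hi - lo)
    return lo, hi, lam


def time_shared_rate(p1, h, m=3):
    """Rate of time sharing between the two nearest m-layer MLC schemes."""
    lo, hi, lam = time_share_weights(p1, m)
    return float(lam) * mlc_rate(float(lo), h) + float(1 - lam) * mlc_rate(float(hi), h)


# The 2-layer example from the text: target p1 = 2/5 between 1/4 and 1/2.
print(time_share_weights(Fraction(2, 5), 2))
# The 3-layer setting of Example 2 (h = 0.3), evaluated at an arbitrary p1.
print(time_shared_rate(0.3, 0.3), mlc_rate(0.3, 0.3))
```

For the target $p_1 = 2/5$, the scheme with $p_1 = 1/4$ is used for a fraction $2/5$ of the time and the scheme with $p_1 = 1/2$ for the remaining $3/5$.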

III. Rateless Transmission Scheme over Binary 
Symmetric Channel using Layering and Repeating 

In this section, we present an application of the proposed 
scheme to rateless transmission over the BSC, which is based 
on layering, interleaving and repeating. We show that in 
the low rate region, repeating preserves information rate and 
therefore it can be used as a simple rateless scheme, which 
extends the result of [7] to the BSC case. We then show 
that layering information does not incur a rate loss for the 
BSC due to its degraded nature. In order to perform
layering, however, the coded bits in each layer have to be
non-uniformly distributed, which is where the proposed MLC
scheme of Section II becomes useful.

The problem of rateless transmission over the BSC can be
formulated as follows: suppose we want to communicate
over a BSC with an unknown but lower-bounded crossover
probability $h \ge h_{\min}$. The capacity of this channel is bounded
by $0 \le C \le 1 - H(h_{\min})$. Since $h$ is unknown, the transmitter
will first send a mother code of rate $R_{\max} = 1 - H(h_{\min})$ and
then keep sending extra redundancy until the receiver gets
enough information to decode. The mother code together with
any of the redundant suffixes should be a good code such that, as
long as just enough redundancy is collected, the receiver
will be able to decode. Therefore, the proposed scheme is
called rateless, since it can work for a wide range of rates
($0 < R \le 1 - H(h_{\min})$). For details on rateless codes over
the binary erasure channel (BEC), we refer interested readers
to [10], [11].

Fig. 4. Information Rate of a 3-Layer MLC Scheme over a BSC with $h = 0.3$

Here, we propose a rateless transmission scheme over 
the BSC based on layering, interleaving and repeating. The 
structure of the rateless scheme over the BSC is shown in
Figure 5.

Proposed Scheme: Encoding: In Block 1, the $i$-th
layer encodes its message into coded bits obeying the $\{p_i, 1 - p_i\}$
Bernoulli distribution. The initial code rate of the $i$-th layer,
$R_i$, should be selected such that $R_{\max} = \sum_{i=1}^{n} R_i = 1 -
H(h_{\min})$. Then, the coded bits from each layer are interleaved.
Since we have the NUI distribution, interleaving rather than
dithering has to be used to make sure that interference from
other layers does not combine coherently when we combine
the repeated blocks (see [7] for details). All the interleaved
layers are then stacked, i.e., bitwise XORed together,
$X_{\mathrm{all}} = \bigoplus_{i=1}^{n} X_i$, and transmitted through the BSC. If the receiver
is not able to decode, Block 2 is sent. That is, all the coded
bits of each layer are repeated, interleaved using a different
set of interleavers, then stacked together and transmitted
through the channel again. The above procedure continues
until the receiver has collected enough repeated blocks to decode,
i.e., $m I(X_{\mathrm{all}}; Y_{\mathrm{all}}) \ge R_{\max}$.

Decoding: At the decoder, we first wait until enough blocks
are collected. Then we apply MSD. In the $n$-th layer, we first
generate the soft information of each coded bit from the repeated
channel outputs and use it to decode the $n$-th layer's codeword. Then
the decoded bits are subtracted and the $(n-1)$-th layer sees a
cleaner channel. We repeat the above decoding procedure until
the 1st layer is decoded. Eventually, the information rate after
$m$-time repetition is $R = R_{\max}/m$.
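The stacking step of the encoder can be sketched as below. This is only an illustration of interleaving, XOR stacking and transmission over a BSC: the layer contents are placeholder bit lists rather than actual MLC codewords, and the function names, the permutation format and the use of `random.Random` are assumptions.

```python
import random


def interleave(bits, perm):
    """Reorder bits so that output position t carries bits[perm[t]]."""
    return [bits[j] for j in perm]


def stack_block(layer_bits, perms):
    """Interleave each layer with its own permutation and XOR all layers together."""
    out = [0] * len(layer_bits[0])
    for bits, perm in zip(layer_bits, perms):
        for t, b in enumerate(interleave(bits, perm)):
            out[t] ^= b
    return out


def bsc(bits, h, rng):
    """Flip each bit independently with probability h."""
    return [b ^ int(rng.random() < h) for b in bits]


# Placeholder layers (assumption): sparse bit patterns standing in for NUI codewords.
LAYERS = [[1, 0, 0, 0], [0, 1, 0, 0]]
PERMS_BLOCK1 = [[0, 1, 2, 3], [1, 0, 3, 2]]
PERMS_BLOCK2 = [[3, 2, 1, 0], [0, 1, 2, 3]]   # a different interleaver set for the repeat

BLOCK1 = stack_block(LAYERS, PERMS_BLOCK1)
BLOCK2 = stack_block(LAYERS, PERMS_BLOCK2)
RECEIVED = bsc(BLOCK1, 0.1, random.Random(0))
print(BLOCK1, BLOCK2, RECEIVED)
```

Each repeated block reuses the same layer bits but a fresh set of permutations, so the interference pattern that a given layer sees differs from block to block.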



[Figure: MLC Scheme n, ..., MLC Scheme i, ..., MLC Scheme 1, each with input distribution $\{p_i, 1 - p_i\}$, stacked into Block 1 and Block 2]

Fig. 5. The Proposed Rateless Scheme over the BSC



As a prerequisite to show the optimality of the proposed 
scheme, we first give the following lemma: 

Lemma 1 Suppose $a$ and $b$ are constants. The function $f(x) =
\log(b + ax)$ satisfies the following inequalities as $x \to 0$:

$$\log b + \frac{a}{b}x - \frac{1}{2}\left(\frac{a}{b}x\right)^2 \le \log(b + ax) \le \log b + \frac{a}{b}x \qquad (7)$$

Proof: This lemma is immediate by considering the
Taylor series expansion of $f(x)$ near $x = 0$. ■
It is shown in [12] that for channels with uniform input 
distribution, repeating preserves information rate, in the low 
rate region. Here we extend this result to the BSC with NUI. 

Theorem 3 Let $X$ be the channel input obeying the $\{p, 1 - p\}$
Bernoulli distribution. $X$ is transmitted through a BSC with
crossover probability $h$. Let $Y$ be the channel output and $Y^m$
be the output of the $m$-time repetition of $X$ through the BSC. When $p \to 0$,
the information rate is preserved by repeating $X$ $m$ times, i.e.,
$\lim_{p \to 0} I(X; Y^m) = \lim_{p \to 0} m I(X; Y)$.

Proof: The channel output $Y$ will obey the $\{p_y, 1 - p_y\}$
Bernoulli distribution, where $p_y = p \circledast h = p(1 - h) + (1 - p)h$.
We have the information rate:

$$I(X; Y) = H(p \circledast h) - H(h) \qquad (8)$$

As $p$ goes to zero, we have:

$$\lim_{p \to 0} m I(X; Y) = \lim_{p \to 0} m p (1 - 2h) \log\frac{1 - h}{h} + o(p) \qquad (9)$$

On the other hand, we can derive the information combining
of a repetition code over the BSC with NUI. Note that $Y^m$
can be viewed as a vector channel output, and there are $(m + 1)$
types of channel outputs. Different outputs of the same type
are just different permutations and are statistically equivalent.
We have the probability of the $i$-th type as:

$$p_i = p\, h^i (1 - h)^{m-i} + (1 - p)\, h^{m-i} (1 - h)^i \qquad (10)$$
$$\phantom{p_i} = p \left[ h^i (1 - h)^{m-i} - h^{m-i} (1 - h)^i \right] + h^{m-i} (1 - h)^i \qquad (11)$$

Fig. 6. Degraded Nature of BSC

Let $A_i = h^i (1 - h)^{m-i} - h^{m-i} (1 - h)^i$ and $B_i = h^{m-i} (1 - h)^i$.
Thus we have $p_i = A_i p + B_i$. Since the probabilities
of all the channel outputs sum to 1, we have:

$$\sum_{i=0}^{m} \binom{m}{i} (A_i p + B_i) = 1 \qquad (12)$$

Besides, since

$$\sum_{i=0}^{m} \binom{m}{i} B_i = (h + 1 - h)^m = 1 \qquad (13)$$

we have:

$$\sum_{i=0}^{m} \binom{m}{i} A_i = 0 \qquad (14)$$



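The mutual information in the low rate region can therefore
be written as:

$$\lim_{p \to 0} I(X; Y^m) = \lim_{p \to 0} \sum_{i=0}^{m} -\binom{m}{i} A_i (\log B_i + \log e)\, p + o(p) \qquad (15)$$
$$\phantom{\lim_{p \to 0} I(X; Y^m)} = \lim_{p \to 0} p \log\frac{1 - h}{h} \sum_{i=0}^{m} -\binom{m}{i}\, i\, A_i + o(p) \qquad (16)$$
$$\phantom{\lim_{p \to 0} I(X; Y^m)} = \lim_{p \to 0} p\, m (1 - 2h) \log\frac{1 - h}{h} + o(p) \qquad (17)$$
$$\phantom{\lim_{p \to 0} I(X; Y^m)} = \lim_{p \to 0} m I(X; Y) + o(p), \qquad (18)$$

where (15) to (16) follows from Lemma 1 and some straightforward
manipulations. From (18), we can see that for very
low rate, repeating preserves information rate. ■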
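Theorem 3 can be checked numerically by computing $I(X; Y^m)$ exactly from the type probabilities $p_i$ and comparing it with $m I(X;Y)$ as $p$ shrinks. This is a sketch with illustrative function names and parameter values ($h = 0.1$, $m = 3$), which are assumptions and not taken from the paper.

```python
from math import comb, log2


def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def single_use_mi(p, h):
    """I(X;Y) for X ~ Bernoulli(p) over BSC(h)."""
    return hb(p * (1 - h) + (1 - p) * h) - hb(h)


def repetition_mi(p, h, m):
    """I(X; Y^m) when the same bit X is sent m times over BSC(h)."""
    h_out = 0.0
    for i in range(m + 1):
        # Probability of one specific output sequence of type i (i flips if X = 1).
        p_i = p * h ** i * (1 - h) ** (m - i) + (1 - p) * h ** (m - i) * (1 - h) ** i
        h_out -= comb(m, i) * p_i * log2(p_i)
    # Given X, the m outputs are independent, so H(Y^m | X) = m * H(h).
    return h_out - m * hb(h)


def preservation_ratio(p, h, m):
    """Ratio I(X; Y^m) / (m * I(X;Y)); Theorem 3 says it tends to 1 as p -> 0."""
    return repetition_mi(p, h, m) / (m * single_use_mi(p, h))


for p in (0.2, 1e-2, 1e-3, 1e-5):
    print(p, preservation_ratio(p, 0.1, 3))
```

The ratio stays below 1 for any fixed $p > 0$ and approaches 1 as $p \to 0$, which is the low-rate behaviour the theorem describes.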
In the proposed scheme, we use layering to drive the coding 
rate of each layer to the low rate region. We can show that 
layering is lossless as follows: 

Theorem 4 For the BSC, layering does not incur any loss 
in information rate and MSD can be used to achieve the 
information rate. 

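Proof: The channel model of information layering over
the BSC is shown in Figure 5. Let the overall stacked
information be $X_{\mathrm{all}} = \bigoplus_{i=1}^{n} X_i$. The overall channel output
is $Y_{\mathrm{all}} = Y_n$, where $Y_i$ denotes the output seen once layers
$i + 1, \ldots, n$ have been removed, so that $Y_0$ is the channel noise alone.
We have the following relationship:

$$I(X_1, X_2, \ldots, X_n; Y_n) = \sum_{i=1}^{n} I(X_i; Y_n \mid X_{i+1}, \ldots, X_n) \qquad (19)$$
$$\phantom{I(X_1, X_2, \ldots, X_n; Y_n)} = \sum_{i=1}^{n} I(X_i; Y_i) = \sum_{i=1}^{n} \left( H(Y_i) - H(Y_{i-1}) \right) \qquad (20)$$
$$\phantom{I(X_1, X_2, \ldots, X_n; Y_n)} = H(Y_n) - H(Y_0) = I(X_{\mathrm{all}}; Y_n) \qquad (21)$$

From the above equations, (21) suggests that stacking does not
incur a rate loss and (19) suggests that the overall information
rate can be achieved by MSD. ■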
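The telescoping argument can be verified numerically. In this sketch, each layer is treated as a Bernoulli input to a BSC whose crossover probability already includes the noise and the lower layers, and the sum of the per-layer rates is compared with the rate of the XOR-stacked input. The layer probabilities, the crossover probability, and the function names are illustrative assumptions.

```python
from math import log2


def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def conv(p, h):
    """Probability of a 1 after XORing independent Bernoulli(p) and Bernoulli(h) bits."""
    return p * (1 - h) + (1 - p) * h


def layer_rates(layer_probs, h):
    """Per-layer rates H(Y_i) - H(Y_{i-1}), where Y_0 is the channel noise alone."""
    rates, q = [], h
    for p in layer_probs:
        q_next = conv(p, q)          # layer i sees layers 1..i-1 plus noise as a BSC(q)
        rates.append(hb(q_next) - hb(q))
        q = q_next
    return rates


def stacked_rate(layer_probs, h):
    """I(X_all; Y_n) for the XOR of all layers sent over BSC(h)."""
    prod = 1.0
    for p in layer_probs:
        prod *= 1 - 2 * p
    p_all = (1 - prod) / 2           # probability of a 1 in the XOR of the layers
    return hb(conv(p_all, h)) - hb(h)


# Illustrative parameters (assumptions).
LAYER_PROBS = [0.1, 0.1, 0.1, 0.1]
RATES = layer_rates(LAYER_PROBS, 0.05)
print(RATES, sum(RATES), stacked_rate(LAYER_PROBS, 0.05))
```

The per-layer rates decrease from the first layer upward, since higher layers are decoded earlier and therefore see more interference, yet their sum matches the stacked rate.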
It remains to show that the rate loss due to repeating does
not accumulate as the number of layers increases, so that as the
number of layers grows large, the overall rate loss due to
repeating is negligible.



Theorem 5 Using layering and repeating for rateless transmission
is information lossless as long as the rate of each
layer is sufficiently small.

Proof: Let the total number of layers be $N$. For the
$j$-th layer, we have the channel input and output as $X_j$, $Y_j$,
the input probability $p_j$ and the crossover probability $h_j$. Let
the probability of 1 of the overall information be $p_N$. For
simplicity, let each layer have $p_i = p$. By recursion we have
the following relationship:

$$p_j = p = \frac{1 - (1 - 2p_N)^{1/N}}{2} \qquad (22)$$

(Note that $p_N = 1/2 - \epsilon$, where $0 < \epsilon < 1/2$, since $p_i < 1/2$.
As $N \to \infty$, $p_N$ can be made arbitrarily close to 1/2.)

The information rate per $m$-time channel use is $m I(X_j; Y_j)$,
while the information rate of the $m$-time repeating is
$I(X_j; Y_j^m)$. We have the overall rate loss as:

$$\Delta = \lim_{p \to 0} \sum_{j=1}^{N} \Delta_j = \lim_{p \to 0} \sum_{j=1}^{N} \left[ m I(X_j; Y_j) - I(X_j; Y_j^m) \right] = 0 \qquad (23)$$

Note that the information rate of each layer can be made arbitrarily
small such that Lemma 1 holds. Thus the information
rate per $m$-time channel use is upper bounded by:

$$\lim_{p \to 0} m I(X_j; Y_j) \le \lim_{p \to 0} m p (1 - 2h_j) \log\frac{1 - h_j}{h_j} \qquad (24)$$

On the other hand, following Lemma 1, we have the following
inequalities:

$$\log(B_i + A_i p) \ge \log B_i + \frac{A_i}{B_i} p - \frac{1}{2}\left(\frac{A_i}{B_i}\right)^2 p^2 \qquad (25)$$
$$\log(1 - B_i - A_i p) \ge \log(1 - B_i) - \frac{A_i}{1 - B_i} p - \frac{1}{2}\left(\frac{A_i}{1 - B_i}\right)^2 p^2 \qquad (26)$$

Note that $A_i$ and $B_i$ do not depend on the number of layers
$N$ and consequently they are bounded. Thus, plugging (25) and (26) into
the expression for the mutual information, we have a lower bound on the
information rate of the $m$-time repetition $I(X_j; Y_j^m)$ as:

$$\lim_{p \to 0} I(X_j; Y_j^m) \ge \lim_{p \to 0} m p (1 - 2h_j) \log\frac{1 - h_j}{h_j} - c m p^2 \qquad (27)$$

where $c$ is a constant.

Combining (24) and (27), the rate loss can be bounded by:

$$\Delta_j = m I(X_j; Y_j) - I(X_j; Y_j^m) \le c m p^2 \qquad (28)$$

Consequently, we have the overall information loss as:

$$\Delta = \lim_{p \to 0} \sum_{j=1}^{N} \Delta_j \le \lim_{p \to 0} N c m p^2 \qquad (29)$$
$$\phantom{\Delta} = 0, \qquad (30)$$

since, by (22), $p$ decays as $1/N$ for a fixed $p_N$, so that $N p^2 \to 0$ as $N \to \infty$.

Note that $p_i < 1/2$ for all the layers, i.e., each layer will
have to have an NUI distribution, since otherwise the interference
seen by the upper layers will have crossover probability
1/2. Thus, in order to perform the rateless transmission, we
need codes for channels with NUI, where the proposed MLC
scheme discussed in the previous section becomes useful.
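The claim that the repetition loss does not accumulate over layers can be explored numerically. The sketch below fixes the overall probability of 1s, splits it evenly over $N$ layers using the per-layer probability $p = (1 - (1 - 2p_N)^{1/N})/2$, and sums the exact per-layer loss $m I(X_j; Y_j) - I(X_j; Y_j^m)$. Modeling layer $j$ as a BSC whose crossover probability $h_j$ combines the channel noise with the lower layers is an assumption consistent with the decoding order described above; the parameter values and function names are illustrative.

```python
from math import comb, log2


def hb(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def conv(p, h):
    """Probability of a 1 after XORing independent Bernoulli(p) and Bernoulli(h) bits."""
    return p * (1 - h) + (1 - p) * h


def repetition_loss(p, h, m):
    """m*I(X;Y) - I(X;Y^m); the H(Y|X) terms cancel, leaving m*H(Y) - H(Y^m)."""
    h_seq = 0.0
    for i in range(m + 1):
        p_i = p * h ** i * (1 - h) ** (m - i) + (1 - p) * h ** (m - i) * (1 - h) ** i
        h_seq -= comb(m, i) * p_i * log2(p_i)
    return m * hb(conv(p, h)) - h_seq


def total_repetition_loss(n_layers, p_total, h, m):
    """Sum of the per-layer repetition losses for n_layers equal-probability layers."""
    p = (1 - (1 - 2 * p_total) ** (1 / n_layers)) / 2   # per-layer probability of a 1
    loss, h_j = 0.0, h
    for _ in range(n_layers):
        loss += repetition_loss(p, h_j, m)
        h_j = conv(p, h_j)   # the next layer also sees this layer as interference
    return loss


# Illustrative parameters (assumptions): overall p_N = 0.4, h = 0.1, m = 2 repetitions.
for n in (4, 16, 64, 256):
    print(n, total_repetition_loss(n, 0.4, 0.1, 2))
```

With these parameters the total loss shrinks as the number of layers grows, in line with the $N c m p^2$ scaling of the bound.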



